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Abstract 



The rational development of specific inhibitors for the ~500 protein kinases encoded in the 
human genome is impeded by poor understanding of the structural basis for the activity 
and selectivity of small molecules that compete for ATP binding. Combining classical dy- 
namic simulations with a novel ab initio computational approach linear-scalable to molec- 
ular interactions involving thousands of atoms, we have investigated the binding of five 
distinct inhibitors to the cyclin-dependent kinase CDK2. We report here that polarization 
and dynamic hydrogen bonding effects -so far undetected by crystallography- affect both 
their activity and selectivity. The effects arise from the specific solvation patterns of water 
molecules in the ATP-binding pocket or the intermittent formation of hydrogen bonds dur- 
ing the dynamics of CDK-inhibitor interactions, and explain the unexpectedly high potency 
of certain inhibitors like 3-(3H-Imidazol-4-ylmethylene)-5-methoxy-l,3-dihydro-indol-2-one 
(SU9516). The Lys89 residue in the ATP-binding pocket of CDK2 is observed to form 
temporary hydrogen bonds with the three most potent inhibitors. This residue is replaced 
in CDK4 by Thr89, whose shorter side-chain cannot form similar bonds, explaining the 
relative selectivity of the inhibitors for CDK2. Our results provide a generally applica- 
ble computational method for the analysis of biomolecular structures, and reveal hitherto 
unrecognized features of the interaction between protein kinases and their inhibitors. 
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1 Introduction 



Approximately 500 different protein kinases are encoded in the human genome [1]. The 
similarity of the mechanism and structure of their catalytic domains remains a major 
obstacle to the rational development of specific inhibitors for the treatment of human 
diseases ranging from cancer to auto-immunity. One important case in point involves the 
family of cyclin-dependent kinases (CDKs), members of which are essential for progression 
through different stages of the cell division cycle in all eukaryotes [2]. In particular, one 
family member, CDK2, is required for the events that lead to DNA replication [3-5]. CDK2 
and its associated cyclins are over-expressed in cancer cells, and might contribute to their 
deregulated growth [6]. Therefore, inhibition of CDK2 through insertion of small molecules 
into its ATP-binding pocket has long been a potential target for cancer therapies [6-9] . A 
number of inhibitor molecules have already been designed for this purpose, some of which 
are currently in clinical trials (see e.g. Refs. [7,9] and references therein). The clinical 
efficacy of these inhibitors critically depends not only on their potency -that is, their ability 
to bind to CDK2 more stably than ATP- but also on their selectivity for CDK2 over other, 
highly homologous members of the CDK family. Non-selective inhibition carries the risk 
of undesired and potentially toxic side-effects [10]. Thus, along with potency, selectivity 
is a crucial issue in inhibitor design [9,11,12]. However, although the crystallographically 
determined structures of inhibitors bound to CDK2 have been used as a basis for the 
rational development of several different inhibitors, features of their potency in CDK2 
inhibition, or their selectivity for CDK2 over the closely related kinase CDK4, remain to 
be explained. 

We address here the problem of computing relative potencies and explaining the selec- 
tivity of five CDK2 inhibitors using an approach that combines classical dynamic simulation 
with novel ab initio calculations linear-scalable to molecular interactions involving thou- 
sands of atoms [13,14]. First principles quantum techniques enable us to calculate binding 
energies of hydrogen-bonded systems to high accuracy, while classical dynamical simula- 
tions allow access to large regions of the potential energy surface and long time scales. As 
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a reference, we also perform a series of extensive docking simulations and scoring functions 
binding energy calculations. 

Our study is based on the crystallographically determined structures of inhibitors bound 
to CDK2. These provide a clear picture of the binding features between the ligand and 
those residues that constitute the binding pocket, revealing the dominant interactions to be 
hydrogen bonding, electrostatic and van der Waals forces. Potent lead compounds for new 
CDK inhibitors have been readily developed by taking advantage of these interactions via 
analysis of the so-called structure-activity relationship [12,15-18]. In addition, computer- 
aided approaches such as database [19], docking [20-22] and scoring function methods [23] 
provide an effective way of probing the binding modes and testing the binding strength of 
large arrays of molecules, and thus help identify potential new lead compounds. Moreover, 
molecular dynamics simulations have proven very useful for analysis of the binding modes 
and the structural rearrangements which are connected with the inhibition or activation 
of CDKs [24-27]. In this work, we extend these techniques to encompass the use of first 
principles methods, which permit calculation of hydrogen-bond strengths at an accuracy 
of about 1 kcal/mol [28], and do not depend on any external set of empirical parameters. 

A number of structural studies have demonstrated that the measured potency of in- 
hibition can be directly correlated with the strength of the local protein/inhibitor inter- 
actions within the ATP binding pocket (see e.g. [9] and references therein). This idea 
has lead to the rational development of inhibitors via addition of functional groups de- 
signed to encourage specific interactions, such as hydrogen bonds, with promising residues. 
Most inhibitors bind to CDK2 in a fashion similar to the adenine ring of ATP, forming 
a triplet of hydrogen bonds to the peptide backbone of residues Glu81 and Leu83 which 
reside in the hydrophilic hinge region at the back of the binding pocket [29]. With the 
goal of increasing the protein-inhibitor interactions, specific functional groups have been 
added to 0^-cyclohexylmethylguanine (NU2058) [29] to develop its more potent variant 
0®-cyclohexylmethoxy-2-(4'-sulfamoylanilino)purine (NU6102) [30], and to 2,6-diamino-4- 
cyclohexylmethyloxy-5-nitrosopyrimidine (NU6027) [29] to obtain a potent carboxamide 
derivative (the 9d variant in [17]) (see Figure 1). Structural investigations demonstrated 
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that the higher potency of these two variants is due to formation of additional interac- 
tions with the polar residue Asp86, while the "standard" triplet of hydrogen bonds is 
retained [17,30]. A puzzling exception is 3-(3H-Imidazol-4-ylmethylene)-5-methoxy-l,3- 
dihydro-indol-2-one (SU9516) which appears to form only the standard hydrogen bond 
triplet but is highly potent and also presents a good degree of selectivity for CDK2 against 
CDK4 [31,32]. Unlike potency, the issue of selectivity has proven to be much harder to 
explain via analysis of the available crystal structures only [11, 12]. In the particular case 
of SU9516, the observed selectivity had previously been proposed to arise from an inter- 
action with the CDK2 specific residue Lys89 [31], but no such interaction was found in a 
recent crystallographic study [32]. Indeed, the Lys89 residue was also a target for the two 
derivatives mentioned above, NU6102 and 9d-NU6027, but again no such interaction has 
been observed in the crystallographic studies performed [17,30]. 

The potency of inhibition can be quantitatively described by the so-called IC50 value, 
i.e. the concentration of inhibitor which is required to reduce the activity of the protein 
by 50 % with respect to a chosen reference state in the absence of inhibitor. We note that 
the IC50 value is dependent on the specific experimental conditions, in particular on the 
ATP concentration used in the activity assay. A more objective measure of the potency, 
which can be used to compare results from different assays, is the inhibition constant Ki. 
This is defined as the equilibrium dissociation constant for the reaction of an inhibitor 
with a protein to form an inhibitor-protein complex. Under the same assay conditions, 
and IC50 values are related via a simple proportionality relationship [33], so that often the 
values can be directly compared [17]. Therefore, both IC50 and Ki values can be assumed 
to be directly related to the free energy change associated with the inhibitor binding to 
the protein [30]. Our ultimate aim here is to reproduce the experimentally measured 
relative potencies of the five inhibitors mentioned above against CDK2 by calculation of 
the differences between their free energies of binding. 

Our calculations reveal that a static analysis of the hydrogen bond patterns formed by 
the inhibitors in the ATP binding pocket is generally not sufficient to predict the correct 
rank order of inhibitor potencies. In particular, electronic structure calculations at the DFT 
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level appear to be necessary to account quantitatively for the screening of the hydrogen 
bond interactions due to the presence of water molecules in the binding cleft. Moreover, 
dynamical simulations reveal the profound effect of the motion of certain key residues that 
are involved in hydrogen bonding within the binding pocket. Only by taking into account 
the dynamical nature of the protein/ligand interactions are we able to rationalise the 
measured potency and explain the observed CDK2 selectivity of the inhibitors considered. 



2 Free energy of binding and potency of inhibition 

The potency of an inhibitor is measured in terms of the inhibition constant Ki, which is 
inversely proportional to the association constant Ka of the reaction 

Protein (P) + Ligand (L) Protein-Ligand Complex {PL). (1) 

This process involves a change of free energy AG = —ksT In Ka^, where ks is the Boltzmann 
constant and T is the absolute temperature. Let us first split AG into gas-phase and 
solvation contributions, and the former in enthalpic and entropic contributions: 

AG = AG, + ^Gsoi. = AEg - TASg + AG.^;, , (2) 

where AGsoiv is defined as the difference Gsoiv{PL) — Gsoiv{,P) — Gsoiv{L), and AEg is 
defined as Eg{PL) — Eg{P) —Eg{L). The enthalpic term AEg can be accurately calculated 
using the DFT approach, whereas both the solvation free energy contributions AGsoiv and 
the gas-phase entropic contributions AS*, are in general not readily accessible within this 
computational scheme. However, when we are looking at the relative potencies between 
two inhibitors 1 and 2 then we are only interested in the ratio Kl/Kf. This enables us 
to look at the difference between their associated free energy gains, A AG = AG^ — AG^, 
which can be directly compared with the Ki values via the relationship: 

AG^ - AG^ = kBT\xi^ , (3) 

and can be written as follows: 

A AG = A AEg + AAG,o,.„ - TAASg. (4) 
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Following Ref. [34], we can assume that the difference between the gas-phase entropy 
changes AAS'^ is negligible within the error bar associated with the calculation of the 
remaining terms. This appears to be justified in the present case given the large similarity 
between the systems considered. 

Within this approximation, we are left with only two terms: 

AAG = AAEg + AAGsoiv (5) 

We calculate the gas phase enthalpy differences Ai^^ within the DFT approach, and the 
solvation free energy differences AGsoiv classically, using the Generalised Born/Surface 
Aerea (GBSA) model. Continuum theories have been used to compute free energies of 
solvation with success for over 20 years [35-37], accounting for entropic and hydrophobic 
effects in a simple and computationally inexpensive manner. This allows us to avoid all the 
difficulties involved in first principles simulations of polar solvents [38-40]. On the other 
hand, we are aware that more refined formalisms could be used to treat solvation effects to 
higher accuracy [41], and in particular we will pay great attention to the specific hydration 
patterns of the PL complexes within the binding pocket, including explicit solvent water 
molecules in our final set of calculations (Section 4.1). Finally, using classical force- field 
techniques we have checked that dispersion forces, which are severely underestimated in 
the standard DFT approach, contribute in a nearly equal way to the binding energy of all 
inhibitors to CDK2, and therefore do not contribute to the free energy differences AAG 
(see Computational Details). 

3 Static binding energy calculations 

3.1 Convergence of binding energy with the system size 

Within the approximations described in the previous section, we are in principle able to 
compute the relative differences of binding free energy AAG, provided that we succeed in 
calculating the total energy (enthalpy) of the protein and the protein/ligand complexes. 
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Inclusion of the whole CDK protein in our calculation would be prohibitively expensive 
in terms of computer time, but this is unnecessary in the present case. Since the small 
inhibitors bind to a well defined and spatially limited region of the protein (the ATP 
binding pocket), we are able to consider only those amino acid residues which are close 
enough to the inhibitor to significantly contribute to the binding energy. In doing this we 
must strike a delicate balance between the gain in accuracy from using a DFT approach 
and the loss of a realistic model due to limiting the size of the systems considered. In 
order to select the smallest possible fragment of CDK2 which ensures that all important 
interactions in the pocket are taken into account we tested the convergence of AEg with 
respect to increasing fragment size. 

The inhibitor used in this convergence test is staurosporine (inhibitor 7 in Figure 1). 
Due to its large size, the calculated binding energy of staurosporine to CDK2 is expected 
to be strongly influenced by long-range electrostatic interactions, and hence to be very sen- 
sitive to the size of the system used to model the binding pocket. Our starting point is the 
available crystal structure of staurosporine in complex with CDK2 (PDB key lAQl) [42]. 
As a first model, we consider staurosporine surrounded by only those amino acids with 
which it makes direct hydrogen bonds (E81, L83, D86 and Q131). Since the side chains of 
E81 and L83 do not directly interact with the inhibitor, they are replaced by methyl groups 
bound to the respective Cq, (Figure 2). Keeping the Cq, atoms of the protein backbone 
constrained in their original x-ray positions, we fully minimise the geometry of the system 
using the DFT approach (see Computational Details). This results, in particular, in the 
accurate estimation of the hydrogen bond lengths and angles between the inhibitor and the 
protein. Three larger models are then constructed, adding to the minimised small system 
all amino acids within 5.0 A, 7.0 A, and 10.0 A from the inhibitor (Figure 2). The binding 
energy is then calculated for all models as the difference in total energy between the pro- 
tein/inhibitor complex, the isolated protein model, and the isolated inhibitor. With the 
exception of the smallest model, we choose not to minimise the geometry of the systems, 
for the following reasons. Firstly, the potential energy surface of a large number of amino 
acids is expected to be very fiat and contain a large number of local minima so that (i) 
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the minimisation would take a long time and would be very expensive to perform at the 
quantum level of accuracy, and (ii) even after full minimisation it is very hard to know 
whether the correct global minimum has been found. Secondly, in principle we should min- 
imise both the protein/inhibitor and the isolated protein models. However, in this case the 
binding energy would be affected by differences in the relative positions of amino acids far 
away from the inhibitor, with the risk of wrongly estimating the binding energy because 
of the aforementioned difficulties in finding the global energy minimum. Therefore, we 
assume here that the available crystal structures represent an average configuration of the 
amino acids reliable enough to compute long range interactions to an accuracy within the 
error bars associated with all other approximations. Binding energies of systems which are 
fully relaxed using the classical force field approach will be presented in Section 5. 

The calculated binding energy values using both the Siesta and the Onetep codes are 
reported in Figure 2 alongside the model systems used. We find that the binding energy 
converges to within 0.15 kcal/mol for the third model of the binding pocket, i.e. when 
all amino acids at a distance of 7 A from the inhibitor are explicitly considered. In the 
next section, models of this size will be used to compute the binding energy of all other 
inhibitors considered using the 0(N) DFT approach (see Computational Details). As a 
further validation test of the 0(N) DFT techniques employed, we have calculated binding 
energy values for the first model of Figure 2 using also the traditional plane- wave formalism. 
The computed binding energies obtained with the Castep, Siesta and Onetep codes 
are -32.0 kcal/mol, -32.2 kcal/mol and -32.4 kcal/mol, respectively. This confirms that 
both 0(N) DFT approaches faithfully reproduce with equivalent accuracy the results of 
the traditional PW implementation. 

3.2 Static calculations using the available cystal structures 

As mentioned in the Introduction, we consider in our study the inhibitors NU2058 (1), 
NU6027 (2), NU6102 (3), the 9d- variant of NU6027 (4), and SU9516 (5) (Figure 1) (for 
simplicity, inhibitors are referred to by their associated number in the following text). 
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Experimentally determined values of Ki are available for the inhibitors 1, 2 [29], 3 [30], 
and 5 [32]. Moreover, IC50 values for 1, 2, 3, and 4 have been determined in the same 
assay [17]. Therefore we are able to obtain an estimate of the Ki value for 4 using the 
proportionality relationship between IC50 and [33]. The experimental Kj values of all 
these inhibitors are reported in Table 1. 

To investigate the interactions between the inhibitors and the protein, we start from 
the crystallographic determined structures of the PL complexes which are available in the 
RCSB Protein Data Bank. Here one has a choice between structures where the inhibitors 
are bound to the active form of CDK2, i.e. the Thrl60-phosphorylated CDK2/cyclin 
A complex, or structures where they are bound to the inactive form, i.e. monomeric 
CDK2. A major structural difference between the two forms lies in the availability (in the 
inactive form) vs. unavailability (in the active form) of the K33 residue within the binding 
pocket. It has been suggested that, when available, K33 strongly interacts with the natural 
ATP ligand in a way which suppresses its turnover [4,5]. Therefore, the binding energy 
of inhibitors to the inactive form of CDK2 is expected to be affected by the presence 
of K33 in the binding pocket. Namely, the presence of the K33 ligand in the binding 
pocket would result in spurious interactions which are in fact absent in the actual activity 
assays from which the Ki values are extracted. Therefore, since our aim is to present a 
consistent comparison of the binding energies of the inhibitors related to their inhibition 
activity, we choose to consider only the interactions between the inhibitors and the binding 
pocket of CDK when K33 is not available. Indeed, the active CDK2/cyclin A complex is 
increasingly used as the reference structure in structure-activity-relationship investigations 
of drug activity [30]. 

In the RCSB database, structures are available for 1, 3 and 4 bound to the Thrl60- 
phosphorylated cyclin-A/CDK2 complex (PDB keys are IHIP, IHIS, lOGU, respectively), 
and for 2 and 5 bound to the monomeric CDK2 form (PDB keys are lElX and 1PF8, 
respectively). In order to prevent the calculated binding energies from being affected by 
strong spurious interactions with the polar residue K33, this has been substituted with an 
alanine residue in the two latter cases. In the available crystal structures, all inhibitors 
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are observed to form a triplet of hydrogen bonds with E81 and L83. In addition, D86 is 
observed to form bonds with 3 and 4, which leads to their increased potency. 

Calculations of the binding energies of all inhibitors are performed as described in 
the case of staurosporine (Section 3.1), considering all amino acids within 7 A from each 
inhibitor. Namely, the hydrogen bond distances are optimised using the PW-DFT approach 
considering only those amino acids that are directly bound to the inhibitors in the crystal 
structure. Then this optimised model is embedded in the larger pocket model, and the 
binding energy Ai?^ is calculated using the 0(N) DFT approach without any further 
geometry optimisation. The obtained binding energy values are reported in Table 1 along 
with the values of the solvation free energy differences ^Gsoiv computed within the GBSA 
model. From these values we are able to compute the relative free energy differences 
between the inhibitors, AAG, using equation 5, where we take as a reference the binding 
energy value of 1, which is the least potent inhibitor considered. As a comparison, we report 
in Table 2 the values of binding energy differences calculated with three different sets of 
scoring functions after extended docking simulations, as described in the Computational 
Details section. 

The results of the docking simulations appear to be roughly consistent with the mea- 
sured rank of potencies (except in the case of inhibitor 2) . However, quantitative agreement 
between the measured and calculated binding energies could not be achieved, the values of 
AAG being consistently underestimated. On the other hand, the agreement between the 
relative energies of binding calculated from static structures within the DFT formalism 
and the experimentally determined relative potencies is worse. In particular, the binding 
energies of inhibitor 3 is highly overestimated with respect to the binding energy of in- 
hibitor 1. Furthermore, the wrong rank order of potency is predicted for inhibitor 2 and, 
more dramatically, for 5. We note that the binding energy of 5 is far too small even when 
only the direct hydrogen bond interactions are considered. In this case a strong interaction 
(e.g. a direct hydrogen bond) seems to be missing in the model considered. However, the 
recently resolved x-ray structure of this inhibitor bound to CDK2 [32], from which our 
model was constructed, reveals no interactions other than the usual triplet of hydrogen 
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bonds with the backbone of E81 and L83. 

In general, the inconsistencies between the measured and calculated values are too large 
to be due to approximations such as the neglection of gas-phase entropic contributions, or 
the limited size of the models considered. Rather, these results suggest that the precise 
solvation patterns of the PL complexes need to be explicitly considered, and a more careful 
analysis of the potential energy surface of the binding modes needs to be performed. In 
the next section we tackle these issues in detail by performing a dynamical analysis of the 
binding modes of the inhibitors bound to the active form of CDK2 in the explicit presence 
of water solvent molecules. 

4 Dynamical force field simulations 

Our investigations start with a dynamical analysis of the isolated CDK2 protein in ex- 
plicit water solvent (PDB key IHCL) and bound to its natural ligand, ATP (PDB key 
IHCK) [43]. After minimisation and equilibration as described in the Computational De- 
tails section, we perform a 0.5 ns run for each of these two systems. We find that the 
TIP3P water solvent and the employed force field are able to give the correct hydration 
shell for residues in the interior of the protein. This is immediately visible from a com- 
parison between the positions of the crystallisation water present in the crystal structure 
and the positions of water molecules with long residence time around selected amino acids 
(as an example, see Figure 3 (a, b)). We stress that the inputs for the dynamical simu- 
lations were prepared stripping all crystallisation water molecules from the original PDB 
files. Hence, this also demonstrates that 0.5 ns is a sufficient period of time to allow water 
molecules to diffuse into the inner cavities of the protein and give the correct hydration 
pattern. In the absence of the ATP ligand, water molecules are found to form a nearly 
planar hydrogen-bonded network inside the binding pocket (Figure 3 (c)). This is consis- 
tent with the strong hydrophilicity of the back region of the binding pocket in an otherwise 
hydrophobic protein environment. The adenine ring of ATP binds into the binding pocket 
of CDK2 via two hydrogen bonds between the Nl and N6 atoms and the residues L83 
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and E81, respectively, as expected from previous experimental and theoretical investiga- 
tions [25,26,43] (Figure 3 (a, b)). These hydrogen bonds are found to be stable and 
present for the whole 0.5 ns simulation. We then perform dynamical simulations of about 
1 ns for inhibitors 1 and 2, and of about 4 ns for inhibitors 3, 4 and 5. As mentioned 
above, there is no available crystal structure of inhibitors 2 and 5 bound to the active 
CDK2/cyclin A complex. Therefore, in these two cases we constructed the initial inputs 
by placing the inhibitors in the binding cleft of the CDK2/cyclin A model extracted from 
its x-ray structure in complex with inhibitor 1. In agreement with the crystal structures, 
all inhibitors are found to form hydrogen bonds in the hydrophilic hinge region at the back 
of the binding pocket. These bonds are present for the entirety of the simulations, and the 
average bond distances are in good agreement with the optimised bond lengths obtained 
after full geometry minimisation within the DFT approach (Figure 4). Inhibitors 2, 3 and 
4 remain bound to the O atom of E81, and the N and O atoms of L83. In the case of 
inhibitor 1, however, after ~35 ps of dynamics at 300 K the initial hydrogen bond with 
the O atom of L83 breaks and is immediately replaced by a new hydrogen bond with the 
nearby lying O atom of H84 (Figure 5). This binding mode is mantained for the rest of 
the simulation, which is stopped after 1 ns. Inhibitor 5 is also observed to form two strong 
hydrogen bonds with the O atom of E81 and the N atom of L83, but does not form a 
strong hydrogen bond with the O atom of L83. Indeed, the donor-H-acceptor angle of this 
bond is ~101°, which indicates the presence of a weaker interaction. In addition, the O 
atom of L83 interacts via hydrogen bonding with the CI atom of the inhibitor, while an 
internal hydrogen-bond between the Nl atom and the 01 of the inhibitor is present in the 
minimised structure of the protein/inhibitor complex (Figure 5). 

Interestingly, hydrogen bonds formed with residues other than E81 and L83 are found 
to be considerably less stable during the dynamics. More specifically, the donor-acceptor 
distances show an intermittent behaviour as the dynamics evolves, indicating that these 
bonds can be reversibly formed and broken at room temperature. In particular, the hy- 
drogen bonds donated by inhibitor 3 to the carboxylic group of the D86 residue, which are 
visible in the crystallographic structure of the CDK2/inhibitor complex, are in fact found 
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to be present for only ~21% of the simulation time (Figure 6 and Table 4). In contrast, the 
corresponding interaction between D86 and inhibitor 4 is present for the whole simulation 
time. However, we observe that D86 binds to the inhibitor alternatively switching between 
the ODl and the 0D2 atoms of its carboxylic group (Figure 6). Furthermore, we find that 
additional hydrogen bonds, which were not indentified in the available x-ray structures, 
can temporarily form during the dynamics. One dramatic example is the bond which we 
see forming between inhibitor 5 and the NH3' group of K89. In the crystal structure [32], 
the distance between the NZ atom of K89 and the N2 atom of the inhibitor is 7.1 A. After 
about 0.3 ns of simulated dynamics at room temperature, however, this distance suddenly 
decreases to about 3 A, indicating the formation of a hydrogen bond (Figure 7). This 
bond is then repeatedly broken and reformed as the dynamics proceeds, and is present 
for ~20% of the time over a total simulated time of more than 4 ns (see next section). 
In an analogous fashion, K89 is observed to form intermittent hydrogen bonds also with 
inhibitors 4 and 5 (Figure 6). Intermittent hydrogen bonding appears thus to be a key 
element of the interaction between CDK2 and its inhibitors, and need to be taken into 
account when calculating the binding energies of the protein/inhibitor complexes. 

4.1 Calculation of binding energies from the AMBER structures 

In order to account quantitatively for the intermittent hydrogen bonds, we first perform 
a full structural minimization within the classical force field starting from a snapshot 
of the dynamics when all hydrogen bonds are present at the same time. A model is then 
extracted from the minimised structure, including only the inhibitor and those amino acids 
which are in direct hydrogen-bond contact with the inhibitor. As done in the case of the 
models extracted from the crystal structures, the side chains of the selected amino acids are 
replaced by methyl groups whenever they do not participate in any hydrogen bond. The 
geometry of this small inhibitor/pocket model system (HB model) is fully minimised using 
the PW DFT approach, keeping the atoms of the backbone fixed. Finally, the binding 
energy of the inhibitor to the small system is calculated as in Section 3.2 (A£^g(HB) values 
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in Table 3). This minimal pocket model is then embedded in a larger model comprising 
all amino acids and water molecules within 7.0 A from the inhibitor, in the geometry 
obtained after full minimisation within the force field potential. An additional force field 
minimisation of this larger pocket model is performed keeping fixed the Ca atoms and all 
atoms of the smaller HB model. This is necessary to relax any possible steric clash between 
the added amino acids and water molecules and the elements of the HB model after the 
minimisation at the DFT level [44]. After this minimisation, the binding energy within 
the larger pocket model is calculated as the difference in the DFT total energy between 
this system, the isolated inhibitor, and the isolated pocket model, without relaxing the 
geometries of the individual components (A£'g(7A) values in Table 3). 

We note that all values of AEg{7 A) are smaller than the corresponding values of 
A£'g(HB). This indicates that the direct hydrogen bond interactions between the inhibitors 
and the protein are substantially screened by the presence of the surrounding amino acids 
and solvation water molecules. Interestingly, when all water molecules are stripped off 
from the 7 A model, the computed binding energies {AEg{7 A dry) in Table 3) are slightly 
larger than the AEg(H.B) values, as found in section 4.2 for the systems extracted from the 
crystal structures. Furthermore, in the case of inhibitor 5, we computed the binding ener- 
gies of systems composed of all amino acids of the HB model and all the water molecules 
within 4.0 A, 6.0 A, and 7.0 A from the inhibitor. The computed AE'g values (in kcal/mol) 
for the HB model and these three systems are -50.3, -50.0, -49.6, -49.1, indicating that 
the presence of water molecules alone does not contribute substantially to the observed 
screening of the interactions passing from the HB to the 7 A systems. The combination of 
these two results leads to the hypothesis that the observed screening effect must arise from 
the simultaneous presence of both the surrounding amino acids and the solvation water 
molecules. 

Indeed, the binding energy of inhibitor 5 calculated adding to the HB system residues 
10 and 86, as well as the backbone of residues 84 and 85, is -44.1 kcal/mol, while the 
binding energy value of the same system after adding a single water molecule which bridges 
the NH^"^ group of K89 and the peptide group between residues 84 and 85 (Figure 8) is 
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decreased dramatically to -39.1 kcal/mol. This drop in binding energy is associated with a 
substantial change in the polarisation of the hydrogen bonds upon inclusion of the further 
protein residues and the water molecule, as can be seen by analysis of the atomic charges 
which best fit the electrostatic potential computed from first principles. Namely, calculated 
atomic charges on the acceptor N atom of 5 changes from -0.23 in the HB model to 0.01 in 
the presence of the additional residues, to 0.10 after adding the water molecule, while the 
charge on the donor H atom of K89 changes from 0.19 to -0.02 to -0.26. This is indicative 
of profound changes in the local electrostatics, and thus in the hydrogen bond strength, 
due to rearrangements of electronic charge in the three systems considered. This result 
highlights the importance of calculating binding energies at a quantum level of accuracy, 
and reveals water-mediated polarization effects [45] as a novel feature of the interaction 
between CDK2 and its inhibitors. 

The values of the binding energy reported in Table 3 represent an upper limit to the 
actual binding energy of the system. In fact, since many of the hydrogen bonds are 
repeatedly broken and formed during the motion of the protein at room temperature, their 
contributions to the binding energy need to be "weighted" by the fraction of time for which 
they are present (Table 4). Taking as an example inhibitor 3, a total of six hydrogen bonds 
are formed: the triplet of stable bonds with E81 and L83, a pair of intermittent bonds with 
D86, and an intermittent bond with K89. Let t*" ftrt,m^ ^tn,89^ ^^^^ ^tn,86,89 ^^^e fractions 
of time for which the bonds with the triplet only, the triplet and residue 86, the triplet 
and residue 89, and the triplet, residue 86 and residue 89 are present, respectively. These 
values can be extracted from an analysis of the variation of the hydrogen-bond distances 
and angles during each simulation. For the presence of a bond we define a cut-off distance of 
4.0 A between donor and acceptor, and a minimum angle of 90° between donor, hydrogen, 
and acceptor. We stress the importance of including a condition on the hydrogen bond 
angle, since the hydrogen bond strength is considerably reduced when this angle deviates 
from the ideal linear conformation [28]. In the case of inhibitor 3 we obtain t*"'^^ = 0.13, 
^tri,89 ^ Q_^g^ g^j^^ ^tri,86,89 ^ 0.08, and t*" = 0.61. 

Now we need to split the computed values of AEg into the individual contributions 
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corresponding to each binding mode. We define Ai?*", AE"*"'^^, AE*"'*^, and Ai?*^*'^^'^^ 
as the binding energies of the inhibitor to the pocket in the presence of the hydrogen 
bond triplet only, the triplet and the residue 86, the triplet and residue 89, and the triplet 
together with both residue 86 and 89, respectively. Each of these contributions are cal- 
culated separately via static total energy calculations of small models extracted from the 
geometry of the HB system previously obtained. At this point, we make the assumption 
that the ratios between the contributions of the individual bonding modes are the same 
in the HB and in the TA pocket models. In other words, the individual contributions 
calculated taking into account only the minimal pocket model are scaled by the same fac- 
tor AEg{7A)/AEy{}lB) (0.85 in the case of inhibitor 3). Within this assumption, we can 
finally calculate the binding energy AEg as follows: 



The computed values of binding energy of the individual binding modes of all five inhibitors 
are reported in Table 5. 

Since explicit water molecules are included in our models, the values of AEg for the 
five inhibitors calculated with the procedure described above already contain the enthalpic 
contributions to the solvation of the PL and the P systems. The remaining contributions to 
the solvation free energy, i.e. the surface area contributions of the PL and P models, as well 
as the solvation free energy of the isolated ligands, are calculated within the GBSA method 
to obtain the solvation free energy differences AG^°il, where the superscript indicates that 
these values refer to the models including explicit water solvent (reported in Table 3). The 
obtained enthalpic and solvation contributions can now be used to compute a final set of 
free energy differences A AG (Table 6). The inferred Ki ratios are now found to agree with 
the experimentally measured values within one order of magnitude for all the inhibitors 
considered, including the delicate case of inhibitor 5. 



AEg 



AEgiYLB) 



^^fri^^trj _|_ ^tri,8<i ^£jtri,86 _j_ ^tri,89^^fr«,89 _|_ ^tri, 86, 89^^tr«, 86, 89^ ^g^ 
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5 Conclusions 



Binding modes and binding energies 

The main conclusion that can be drawn from our investigations is that the potency of 
inhibition of small molecules which bind to the ATP pocket of CDK2 is largely dependent 
on the dynamical nature of the protein/inhibitor interactions. In particular, the motion of 
both the inhibitor and of particular residues within the binding pocket is responsible for 
the intermittent formation and rupture of hydrogen bonds. Such effects are not easy to 
identify by using only static methods for structure investigation, such as x-ray crystallog- 
raphy. Moreover, even if the orientation of the inhibitor in the binding pocket remains the 
same, differences in the conformation of the backbone of the protein can lead to substan- 
tial differences in the magnitude of the calculated hydrogen bond strengths (cf. Tables 1 
and 3). The dynamical nature of the hydrogen bond interactions is likely to be the main 
reason for the quantitative disagreement between the experimental binding energy values 
and the values obtained using scoring functions after rigid docking simulations (see Table 2 
and Ref. [20]). Notably, the binding free energy differences calculated with this method 
are underestimated with respect to the experimental values, which is consistent with the 
neglection of additional intermittent interactions within the protein active site. On the 
other hand, the scoring functions employed are carefully parametrised to account for hy- 
drophobic and solvation effects in a simple but robust manner. Indeed, they are capable 
of giving a better qualitative estimate of the rank of potencies between the considered 
inhibitors than simplistic DFT binding energy calculations after geometry optimisation of 
protein/inhibitor complexes extracted from the crystal structures, in the absence of explicit 
solvent molecules (Table 1). 

Our results stress the importance of including a large number of amino acids in the 
models for the ATP binding pocket (see Figure 2), since long range electrostatic effects 
contribute substantially to the computed binding energies. In general, when no explicit 
water molecules are included in the calculations, the binding energy increases with increas- 
ing number of amino acids included in the calculations (Figure 2 and Tables 1 and 3), 

18 



converging within 0.15 kcal/mol when all amino acids within 7 A from the inhibitor are 
considered. A major change to the computed binding energy is due to the explicit pres- 
ence of water molecules within the binding pocket. In this case hydrogen bond interactions 
are substantially screened due to water-mediated hydrogen bond depolarisation effects, so 
that the binding energy is reduced up to ~35 % of its original value (Tables 3 and 5). 
Since relatively few water molecules can be identified by crystallographic methods, this 
stresses the importance of using MD methods to obtain a correct solvated model of the 
protein/ligand complex. Moreover, given the complex nature of the screening effects ob- 
served, this indicates that calculations of binding energies necessarily require the use of 
quantum techniques like DFT. 

The main limitation of MD methods is that the accessible simulation time is very short. 
Therefore, the fractions of time associated with each individual binding modes which enter 
into the calculations of the relative free energy of binding should be considered as indicative 
values [46]. Given this limitation, the approximations made on the solvation and entropic 
terms contributing to the binding energies appear to be less severe. While gas-phase 
entropic terms (vibrational, internal rotational and conformational) necessarily need to be 
explicitly considered in order to compute absolute values of binding free energies, this is not 
necessary when comparing relative values of binding free energies among different inhibitors 
of the same protein [34]. As mentioned in Section 3, this approximation is justified by the 
chemical similarities of the systems considered. Also the underestimation of direct van 
der Waals interactions within the DFT techniques employed appears not to be crucial 
in the present case. Dispersion forces can represent large fractions of absolute binding 
energies in biological systems [47-49], and are indeed of the order of 45 kcal/mol in our 
minimised CDK2/ligand complexes (i.e. at the maximum van der Waals contact between 
the hydrophobic and aromatic residues in the binding pocket and the ligands). However, 
when the differences of van der Waals energies are computed as averages of relatively long 
dynamic runs at 300 K, they are found to be the same for all the inhibitors considered in 
this work (within the error bar of a few kcal/mol intrinsic to the MM method), and thus 
they do not contribute substantially to the calculated AAG values. 
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As far as the calculation of solvation free energies is concerned, this is in general a 
very difficult task, requiring in principle accurate formalisms and an explicit treatment of 
solvent molecules [41]. Here we have used the widely used continuum GBSA model [37] to 
compute the solvation free energies (apart from the final set of calculations, where explicit 
water molecules are included in the PL and P models, see Section 4.1), expecting errors 
in the differences between solvation free energies calculated within the same method to be 
negligible compared with all other approximations. 

Specificity of inhibition 

Considering the approximations inherent to the methods employed and the limited statis- 
tics accessible to atomistic MD simulations, the agreement between calculated and mea- 
sured relative potencies is remarkably good. It has to be noted, however, that direct 
calculations of inhibition constants is still out of the reach of this and analogous computa- 
tional schemes. The power of our method lies in the ability of predicting and quantifying 
all interactions present between a given inhibitor and, in this case, CDK2. In particular, 
in the case of inhibitor 5 analysis of the computed hydrogen bond strengths immediately 
revealed that a strong interaction was absent from the previously considered structural 
model. With the help of dynamical simulations, this interaction has been identified in the 
form of an intermittent hydrogen bond with Lys89, which was not visible in the crystal 
structure [32] . The same residue has been observed to form similar intermittent interactions 
also with inhibitors 3 and 4. The presence of such interactions is particularly important, 
since Lys89 is a residue specific to CDK2. In CDK4, for instance, a much shorter threo- 
nine residue is present in the same position (Thr89), which is not expected to be able to 
form hydrogen bonds with small inhibitors, unless they are specifically modified for this 
purpose [50]. Therefore, there is an increase in binding energy between the inhibitors and 
the protein which is specific to CDK2, and is not expected to be present in the case of 
CDK4. This is consistent with the observed, yet so far unexplained selectivity patterns of 
inhibitors 3 [30] and 5 [32]. 
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Recent inhibitor profiles have targeted the Lys 89 residue for this very reason, placing 
a highly hydrophilic group at a favourable position [51] . However, we note that the gain 
in binding enthalpy from adding hydrophilic groups targeting Lys89 is counterbalanced by 
the penalty of an increased solvation energy. Inhibitor 5 is an interesting case, having a 
considerably lower solvation energy than other pyridine and pyrimidine based inhibitors, 
yet still being capable of interaction with LysSQ. In principle, it is also conceivable that 
Lys89 could form an intermittent hydrogen bond with the Nl atom of inhibitor 1, which 
could help explain why also in the case of this inhibitor a moderate selectivity against CDK4 
has been observed [30]. On the other hand, this selectivity could also arise from factors not 
directly connected to local interactions within the ATP binding cleft, or from more subtle 
effects, e.g. differences between the hydrophobicity patterns of the two proteins [52]. 

Summary 

The relative differences of binding free energies AAG between five inhibitors of CDK2 
have been calculated by using a combination of linear-scaling DFT techniques and clas- 
sical MD simulations. The AAG values obtained after direct structural optimisation of 
protein/inhibitor complexes extracted from the available crystal structures are found not 
to correlate well with the measured potencies of inhibition. As a comparison, extensive 
docking simulations followed by free energy calculations using scoring function methods 
provided a better qualitative estimation of the rank of potencies among the inhibitors, 
although a quantitative agreement could not be achieved. Molecular dynamics simulations 
using classical force-field techniques revealed that some of the hydrogen bonds formed be- 
tween the inhibitors and the ATP binding cleft of CDK2 are in fact intermittent, being 
repeatedly broken and formed at room temperature. These dynamical effects are taken 
into account in the 0(N) DFT binding energy calculations by weighting each interaction 
by the percentage of the time for which it is actually present. Moreover, a combined ef- 
fect due to the presence of solvation water molecules and neighbouring amino acids leads 
to a substantial screening of the protein/ligand hydrogen bond interactions, which can 
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be accurately quantified using DFT techniques. Despite the approximate nature of the 
methods and the hmited statistics available, AAG values calculated taking these effects 
into account show a much better agreement with the experimentally measured potencies of 
these inhibitors. Finally, the simulations reveal the presence of intermittent hydrogen-bond 
interactions between the inhibitors and the Lys89 residue, which have not been identified 
in the crystallographic studies performed to date. As Lys89 is a CDK2 specific residue, 
while in CDK4 a much shorter Thr residue is present in position 89, the existence of such 
an interaction offers an explanation for the observed, yet so far not understood specificity 
of some of the inhibitors for CDK2 against CDK4. 

Taken together, our findings reveal new factors underlying the relationship between 
the structure and activity of CDK inhibitors. They demonstrate that polarization effects 
arising from the specific hydration patterns in the ATP-binding pocket of CDK2 contribute 
to the differential potency of its ihibitors. Such polarization effects may be a general feature 
in other kinase-inhibitor complexes, which must be taken into account for rational drug 
design. Our results also demonstrate a contribution of intermittent hydrogen bonding 
interactions, not previously recognized in static crystallographic studies, to the potency of 
CDK2 inhibitors and their relative selectivity over CDK4. Finally, this work provides a 
generally applicable computational method which can be adapted to the investigation of 
other kinase-inhibitor structures, and which could be a useful tool in rational drug design. 

6 Computational Details 

Density Functional Theory calculations 

Quantum mechanical calculations of total energies were performed both with traditional 
plane-wave (PW) [53], and with novel linear-scaling (0(N)) [13, 14] implementations of 
Density Functional Theory (DFT), using the PBE gradient corrected exchange-correlation 
functional [54]. The Castep [55] code was employed for the PW calculations of smaller 
systems, using ultrasoft pseudopotentials [56] , and expanding the wave functions at the F- 
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point of the Brillouin zone up to a kinetic energy cut-off of 450 eV. The modelled molecules 
were placed in cubic simulation cells with edge length of 25A, which was sufficient to avoid 
any spurious interactions between the system and its periodicalUy repeated images in all 
cases considered. The computing cost of the PW approach increases with the cube of the 
number of atoms and becomes prohibitively expensive for systems with more than a few 
hundreds of atoms. In contrast, 0(N) methods have recently been developed where the 
increase in computing cost is linear with the number of atoms, and allow us to extend the 
DFT calculations to much larger systems. 0(N) DFT calculations were performed using 
both the widely used Siesta code [13] and the newly developed Onetep code [14]. In 
both cases the interactions between electrons and nuclei were described by norm-conserving 
pseudopotentials [57,58]. Siesta calculations were performed by placing the systems in 
cubic supercells with edge lengths of up to 65 A, avoiding any spurious interactions be- 
tween the simulated system and its periodically repeated images. The employed basis set 
consists of a DZP basis of localized atomic orbitals which were variationally optimized to 
accurately describe hydrogen bonds in biomolecular systems [59] . Due to the atomic basis 
set, calculations of binding energies needed to be corrected from the so-called basis set 
superposition error (BSSE) using the counterpoise method of Boys and Bernardi [60]. In 
the Onetep code the DFT equations are solved using a minimal set of strictly localised 
functions which are optimised in situ in a basis set which is formally equivalent to a plane 
wave expansion [61]. Therefore, the same levels of accuracy and rapidity of convergence of 
traditional PW codes are achieved [62], and there is no superposition error to be corrected 
during the calculation of binding energies [63]. The Onetep calculations were performed 
using a kinetic energy cut-off of 600 eV and placing the systems in a cubic simulation 
supercell with edge length of about 80 A. In all calculations charge neutrality of the sys- 
tems was imposed by protonation or deprotonation of residue groups of the protein models 
which were not involved in any direct interaction with the inhibitors. 

We note that our calculations provide an important comparison between a well es- 
tablished, but computationally costly, DFT implementation, and two recent linear-scaling 
methods for the quantum mechanical treatment of very large systems (up to thousands of 
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atoms) based on two very different mathematical formalisms. Our results show that the 
differences in the computed binding energies between the three different codes are less than 
1.5 kcal/mol for systems including about 1000 atoms (see Section 4.1), i.e. within the error 
bar associated with the other approximations employed (supercell method, pseudopoten- 
tials, exchange-correlation functionals) . For instance, the binding energy values of one of 
the inhibitors calculated with the Castep, Siesta, and Onetep codes are -16.2 kcal/mol, 
-16.1 kcal/mol, and -16.4 kcal/mol, respectively (another comparisons between the three 
codes have been presented in Section 3.1). Hence the results obtained with any of the 
codes are entirely equivalent at the standard accuracy level reached by DFT techniques. 
Nevertheless, to ensure full consistency of the presented results, the reported values of the 
differences in binding energies are the ones calculated with the Onetep code. The large 
volume calculations necessary at the exploratory and set-up stages of the work (concerning 
size convergence, tests on charged states, choice of protonation sites, etc.) were performed 
using the Siesta code. 

Classical Force-field Simulations 

The Amber 7.0 [64] suite of programmes was used for all the classical force-field simu- 
lations. Point charges on the atoms of the inhibitors were calculated as a best-fit to the 
electrostatic potentials computed within the DFT formalism in a 1.0 A thick spherical 
region outside the Van der Waals radii of the atoms. All other parameters for the in- 
hibitors were determined using the Antechamber facility making use of the gaff force-field 
provided. Solvation free energies were calculated within the standard pairwise Generalised 
Born model, taking into account entropic contributions using the the LCPO method to 
calculate the molecular surface area. The classical simulations with explicit solvent were 
carried out using the ff99 [65] force field based on the parameters of Cornell et al. [66], and 
the T1P3P water solvent. The models for the classical simulations were prepared by reading 
into the Leap module of Amber the crystal structures of the protein/inhibitor complexes 
after having stripped all crystallographic water molecules and counterions present. Each 
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model was then protonated, solvated in a tetragonal box with edge lengths ensuring a min- 
inum 20 A distance from the system and its periodically repeated images, and neutralised 
by adding chloride counterions. The dynamical simulations were performed following a 
standard three-step equilibration procedure: (i) The system was first fully minimised to 
release any unwanted steric clashes, and then the solvent was equilibrated in a 0.015 ns run 
slowly increasing the temperature to 300 K at the constant pressure of 1 atm, keeping the 
solute frozen, (ii) A series of 5 minimisation runs was then performed, in which the har- 
monic constraints on the solute were progressively released and eventually eliminated, (iii) 
Subsequently, the system was equilibrated in a 0.05 ns run to reach conditions of constant 
temperature (300 K) and pressure (1 atm). After equilibration, we carried out the produc- 
tion run for a time which varied from 0.5 to 4.0 ns, depending on the system considered. 
Only the production runs were taken into account in the analyses of the hydrogen bond 
patterns (Section 5.1). We used an Ewald cut-off sphere of 12.0 A and a timestep of 2 fs for 
the integration of the equations of motion, employing the SHAKE algorithm to keep fixed 
all bonds involving H atoms. A harmonic constraint of 5.0 kcal/mol/A^ was kept on the 
amino acids immediately before and after any protein sequences unavailable in the crys- 
tal structure (typically in the region from residue 36 to residue 43 of the CDK2/inhibitor 
complexes) . 

Van der Waals dispersion interactions 

The standard DFT technique employed systematically underestimates direct interactions 
due to dispersion forces. We thus carefully checked that the dispersion interactions between 
the protein and the ligands in the binding pocket are roughly the same for all inhibitors 
considered, so that they do not contribute significantly to the computed free energy differ- 
ences AAG. An estimate of the maximum dispersion interactions was obtained from the 
values of the Van der Waals energies of the minimised PL complexes and the isolated P 
and L systems in the same atomic configurations as in the PL complex, computed with 
the classical Amber force field in an implicit solvent. For the NU2058, NU6027, SU9516, 
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9d-NU6027 and the NU6102 inibitors, we compute net Van der Waals interaction energies 
of 41.4, 42.0, 40.1, 54.2 and 55.8 kcal/mol, respectively. These large values are consistent 
with other quantum chemical and empirical calculations of dispersion energies in aromatic 
systems [47,48]. We note, however, that the dispersion interactions are considerably af- 
fected by the brownian motion of the ligand and of the residues within the binding pocket in 
water solution at 300 K. We therefore computed the average of the Van der Waals binding 
energies over the configurations of the PL complexes during the dynamics at 300 K in an 
explicit solvent for the most weakly and most strongly bound inhibitors, i.e. SU9516 and 
NU6102. The differences AE^cLwiPL) — AE^^wiP) ~ ^Evdw{L) between those inhibitors 
were found to be less that ~4 kcal/mol, which is within the error bar of the method. We 
therefore conclude that direct dispersion interactions do not contribute in a substantial 
way to the different potencies among the inhibitors, so the differential values of binding 
energy can be safely computed within the DFT scheme. 

Scoring Functions calculations 

Docking simulations were carried out using a recently introduced quantum stochastic tun- 
nelling docking method [23,67]. This is a very efficient hybrid optimisation method that 
allows for the quantum and stochastic tunnelling through high energy barriers and the non- 
local exploration of the potential energy surface. Two hundred simulations were performed 
for each ligand-protein complex under the same conditions as described in [23], using the 
PLP scoring function [68] to represent the potential energy surface. A large number of 
binding modes were generated, which were then re-scored using the Bohm [69] and Screen- 
Score [70] scoring functions. The free energy of binding (for each scoring function) of 
each ligand-protein complex reported corresponds to the best energy found amongst all 
generated binding modes within 1.0 A RMSD of the binding mode visible in the crys- 
tal structure. This precaution ensured that the best estimate of the free energy binding 
corresponded to the experimentally observed binding mode. 
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Tables 



Inhibitor 


Ki (/iM) exp. 


AE,(HB) 


A^,(7A) 




AAG 


AAG exp. 


1 


(1.2±0.3)E+1 


-20.0 


-18.3 


19.4 


0.0 


0.0 


2 


(1.3±0.2) 


-18.5 


-18.4 


21.4 


+ 1.6 


-1.4 


5 


(3.1±0.6)E-2 


-14.6 


-15.1 


19.4 


+2.9 


-3.7 


4 


(2.4±0.8)E-2 


-37.1 


-43.4 


41.4 


-3.4 


-3.8 


3 


(6.0±0.5)E-3 


-43.3 


-48.2 


32.7 


-14.1 


-4.7 



Table 1: Binding energies, solvation energies, and relative binding free energy differences 
(all in kcal/mol) of five CDK2 inhibitors calculated at the DFT level of theory compared 
to the experimentally measured inhibition constants and relative difference of free energy 
of binding (in italics). The binding energy values are obtained starting from the avail- 
able crystal structures after geometry optimisation of the hydrogen bond distances of the 
protein/ligand complexes. 
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Inhibitor 


PLP 


Bohm 


ScreenScore 


AAG exp. 


1 


0.0 


0.0 


0.0 


0.0 


2 


+2.1 


+0.5 


-0.3 


-1.4 


5 


-7.2 


-1.5 


-1.0 


-3.7 


4 


-35.5 


-1.4 


-1.4 


-3.8 


3 


-37.0 


-3.1 


-3.7 


-4.7 



Table 2: Relative differencies in the free energies of binding calculated with three different 
sets of scoring functions after extended docking simulations, compared with the experimen- 
tal values (in italics). PLP values are in arbitrary energy units, Bohm and ScreenScore 
values are in kcal/mol. 



Inhibitor 


AE,(HB) 


A^,(7A) 


AEg{7A dry) 


A riwat 


1 


-16.2 


-12.5 


-17.4 


10.7 


2 


-17.3 


-13.8 


-18.5 


11.4 


5 


-50.3 


-34.3 


-51.3 


9.7 


4 


-57.5 


-36.7 


-59.1 


15.4 


3 


-74.6 


-63.6 


-81.7 


19.4 



Table 3: Binding energies (kcal/mol) calculated at the DFT level of theory after MD 
simulations and geometry minimisation of the protein/ligand complexes using the Amber 
force-field, and DFT optimisation of the hydrogen bond distances, considering three models 
of the binding pocket as described in the text. Section 4.1., as well as the contribution to 
the binding free energy due to solvation, AG^^'J*. 
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Inhibitor 


H-bond 


H-bond 


length (A) 


length (A) 


t 




(x-ray) 


(MD) 


(x-ray) 


(DFT) 


(%) 


1 


E81(0) 


E81(0) 


2.73 


2.86 


100 




L83(N) 


L83(N) 


3.00 


3.61 


100 




L83(0) 


H84(0) 


2.96 


2.95 


100 


2 


E81(0) 


E81(0) 


2.79 


2.81 


100 




L83(N) 


L83(N) 


2.95 


3.12 


100 




L83(0) 


L83(0) 


2.83 


2.85 


100 


5 


E81(0) 


E81(0) 


3.00 


2.77 


100 




L83(N) 


L83(N) 


2.94 


2.92 


100 




L83(0) 


L83(0) 


4.75 


3.06 


100 




- 


K89(NZ) 




2.77 


20 


4 


E81(0) 


E81(0) 


3.03 


2.79 


100 




L83(N) 


L83(N) 


3.24 


3.54 


100 




L83(0) 


L83(0) 


2.92 


3.07 


100 




D86(OD2) 


D86(ODl,OD2) 


2.79 


2.70 


100 






K89(NZ) 




2.53 


27 


3 


E81(0) 


E81(0) 


2.97 


2.76 


100 




L83(N) 


L83(N) 


3.15 


3.27 


100 




L83(0) 


L83(0) 


2.97 


2.99 


100 




D86(N) 


D86(N) 


3.74 


3.18 


21 




D86(OD2) 


D86(ODl,OD2) 


2.72 


2.58 


21 






K89(NZ) 




2.65 


26 



Table 4: Hydrogen bonds visible in the crystal structures (x-ray) and during force field MD 
simulations (MD). Donor-acceptor pairs and hydrogen bond distances after DFT optimi- 
sation are reported together with the time percentage for which each bond is present in the 
MD simulations. Residues making hydrogen bonds not visible in the x-ray structure are 
reported in italics. ODl and 0D2 are the oxygen atoms of the carboxyl group of aspartic 
acid; NZ is the nitrogen atom of the NHjj" group of lysine; O and N are the peptide oxygen 
and nitrogen atoms of the indicated residues. 
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Inhibitor 


AE,(7 A)/AE,(HB) 




A T?tri,m 
9 




A 771*^,86,89 

9 


1 


0.772 


-16.2 








2 


0.798 


-17.3 








5 


0.682 


-17.6 




-50.3 




4 


0.638 


-12.8 


-23.5 




-57.5 


3 


0.853 


-18.3 


-42.0 


-47.0 


-74.6 



Table 5: Ratios between the binding energies calculated in the HB and 7 A models and 
binding energy values (kcal/mol) of systems composed by the inhibitors and different sub- 
sets of amino acids, as described in the text, Section 4.1. 



Inhibitor 


AAG 


AAG exp. 


^AAG/ksT 




1 


0.0 


0.0 


1.0 


1.0 


2 


-0.6 


-1.4 


3.8 E-1 


(1.1 ± 3.2)E-1 


5 


-4.9 


-3.7 


3.5 E-4 


(2.6 ± 0.8)E-3 


4 


-3.6 


-3.8 


2.9 E-3 


(2.0 ± 0.8)E-3 


3 


-5.3 


-4.7 


1.8 E-4 


(5.0 ± 1.3) E-4 



Table 6: Computed differences of binding free energy (kcal/mol) and relative potencies 
compared with experimental values (in italics) obtained from the measured inhibition con- 
stants. 
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Figure captions 



Figure 1: The five CDK2 inhibitors considered in this study, NU2058 (1), NU6027 (2), 
NU6102 (3), 9d-NU6027 (4), SU9516 (5); the natural CDK2 hgand, ATP (6); and stau- 
rosporine (7). 

Figure 2: Convergence with CDK fragment size of binding energy values (kcal/mol) calcu- 
lated using two different 0(N) DFT codes for the case of staurosporine bound to the ATP 
pocket of CDK2. 

Figure 3: Comparison between the position of selected solvation water molecules visible in 
the crystal structure of the CDK2 / ATP complex (a) and the position of water molecules 
with long residence time in an MD simulation of the same system (b). Corresponding 
molecules are highlighted in green, and the hydrogen bonds made by ATP in the binding 
cleft are shown with dotted lines. In (c) and (d) a comparison is shown between the planar 
network of water molecules and the position of the ATP ligand within the binding cleft of 
CDK2. 
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Figure 4: Evolution of the hydrogen bond distance between inhibitor 3 and the peptide 
nitrogen of Leu 83 during an MD dynamical simulation, compared with the distance cal- 
culated at the DFT level (horizontal line). 



Figure 5: Binding modes of inhibitors 1 (a) and 5 (b) in the ATP binding cleft of CDK2 
as obtained from MD simulations. The formed hydrogen bonds are indicated with dotted 
lines. 



Figure 6: Evolution of the distances of hydrogen bonds formed between inhibitors 4 (left) 
and 3 (right) and residues D86 (top) and K89 (bottom) of CDK2. 



Figure 7: Evolution of the distance of the hydrogen bond formed between inhibitor 5 and 
residue K89 of CDK2. 



Figure 8: A water molecule (indicated with an arrow) bridging the NH3"'' group of K89 
with the peptide O atom of H84 near the SU9516 inhibitor. The calculated binding energy 
decreases by 5 kcal/mol when this single water molecule is included in the DFT total 
energy calculations (see text, Section 4.1). 
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L. Heady et al, Figure 1 
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H-bonds only Residues within 5 A Residues within 7A Residues within lOA 




SIESTA 32.22 37.40 39.59 39.46 

ONETEP 32.38 38.88 40.59 40.47 



L. Heady 



et al, Figure 2 
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L. Heady et al, Figure 3 
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L. Heady et al, Figure 4 
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L. Heady et al, Figure 5 
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L. Heady et al, Figure 6 
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Time (ns) 



L. Heady et al, Figure 7 
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L. Heady et al, Figure 8 
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